Towards Full Automation Of Lexicon Construction

Authors

  • Richard Rohwer
  • Dayne Freitag
Abstract

We describe work in progress aimed at developing methods for automatically constructing a lexicon using only statistical data derived from analysis of corpora, a problem we call lexical optimization. Specifically, we use statistical methods alone to obtain information equivalent to syntactic categories, and to discover the semantically meaningful units of text, which may be multi-word units or polysemous terms-in-context. Our guiding principle is to employ a notion of “meaningfulness” that can be quantified information-theoretically, so that plausible variants of a lexicon can be judged relative to each other. We describe a technique of this nature called information theoretic co-clustering and give results of a series of experiments built around it that demonstrate the main ingredients of lexical optimization. We conclude by describing our plans for further improvements, and for applying the same mathematical principles to other problems in natural language processing.
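
The idea of scoring lexicon variants by an information-theoretic quantity can be illustrated with a small sketch. This is not the authors' implementation: it assumes a toy term-by-context count table (invented for illustration) and scores a co-clustering by how much of the mutual information between terms and contexts survives after rows and columns are merged into clusters, which is the general idea behind information-theoretic co-clustering. All function names and data below are hypothetical.

```python
import math

def normalize(counts):
    """Turn a table of co-occurrence counts into a joint distribution."""
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]

def mutual_information(joint):
    """Mutual information, in bits, of a joint distribution (nested list)."""
    row_marg = [sum(row) for row in joint]
    col_marg = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero cells contribute nothing
                mi += p * math.log2(p / (row_marg[i] * col_marg[j]))
    return mi

def aggregate(joint, row_clusters, col_clusters):
    """Sum joint probabilities within each (row cluster, column cluster) cell."""
    n_rows = max(row_clusters) + 1
    n_cols = max(col_clusters) + 1
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            out[row_clusters[i]][col_clusters[j]] += p
    return out

# Hypothetical counts: rows are terms, columns are contexts.
counts = [
    [4, 4, 0, 0],
    [4, 4, 0, 0],
    [0, 0, 4, 4],
    [0, 0, 4, 4],
]
joint = normalize(counts)

# Two candidate co-clusterings, given as cluster labels per row and per column.
good = aggregate(joint, [0, 0, 1, 1], [0, 0, 1, 1])
bad = aggregate(joint, [0, 1, 0, 1], [0, 1, 0, 1])

mi_full = mutual_information(joint)
mi_good = mutual_information(good)
mi_bad = mutual_information(bad)

print(f"unclustered: {mi_full:.3f} bits")
print(f"good co-clustering retains {mi_good:.3f} bits")
print(f"bad co-clustering retains {mi_bad:.3f} bits")
```

In this toy table the first co-clustering groups terms that share contexts and so keeps all of the mutual information, while the second mixes the two groups and keeps none. A search procedure that prefers clusterings with smaller information loss would therefore choose the first.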


Similar resources

On-Line Character Analysis and Recognition With Fuzzy Neural Networks

A new recognition system based on a neuro-fuzzy system, called FasArt, is proposed in this paper. Satisfactory results were obtained using the train_r01_v02 UNIPEN dataset, together with a comparison with the recognition rates achieved by independent human testers. Two methods for segmenting handwritten components into strokes are proposed, with better experimental results for the method based ...


ITRI-00-37 Semi-automatic construction of multilingual lexicons

The construction of lexicons for NLP applications is a potentially very expensive task, but a crucially important one, especially in multilingual applications. The automation of the task from generic data sources or corpora is as yet largely impractical for most applied systems. In this paper we describe a methodology for the semi-automation of the task, used in the CLIME project to develop bil...


Lexicon Based Ontology Construction

Researchers from industry and academia are now exploring the possibility of creating a "Semantic Web," in which meaning is made explicit, allowing machines to process and integrate Web resources intelligently. This technology will allow interoperability among development of intelligent internet agents in large scale, facilitating communication between a multitude of heterogeneous web-accessible...


3-D Graphical Visualization for Construction Automation

The availability of low-cost, high-performance computers that are capable of real-time 3-D graphic simulation has led to a plethora of applications in the construction industry. This technology is particularly beneficial in the design and simulation of automated construction systems. The expense of physically constructing and implementing a full-scale prototype automated construction systems h...


A Proposed “model for Adoption” of High Technology Products (robots) for Indian Construction Industry

The construction industry is considered labour intensive, short of skilled labour, and unsafe, with a large number of industrial accidents. It requires high-technology automation products (robots) to improve productivity, safety, quality, etc. Robots have been developed by various countries for areas such as demolition, earthwork, bridges, tunnels, road work, underwater wo...




Publication date: 2004